Feature selection for imbalanced data with deep sparse autoencoders ensemble
Authors
Abstract
Class imbalance is a common issue in many domain applications of learning algorithms. Oftentimes, in these same domains it is much more relevant to correctly classify and profile minority class observations. This need can be addressed by feature selection (FS), which offers several further advantages, such as decreasing computational costs and aiding inference and interpretability. However, traditional FS techniques may become suboptimal in the presence of strongly imbalanced data. To achieve the advantages of FS in this setting, we propose a filtering algorithm that ranks feature importance on the basis of the reconstruction error of a deep sparse autoencoders ensemble (DSAEE). Each DSAE is trained only on the majority class and used to reconstruct both classes. From the analysis of the aggregated reconstruction error, we determine the features on which the minority class presents a different distribution of values w.r.t. the overrepresented one, thus identifying those that best discriminate between the two classes. We empirically demonstrate the efficacy of our algorithm in experiments on simulated and high-dimensional datasets of varying sample size, showcasing its capability to select generalizable features for the minority class and outperforming other benchmark methods. We also briefly present a real application in radiogenomics, where the methodology was applied successfully.
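To make the general idea concrete, the sketch below illustrates a DSAEE-style feature ranking under simplifying assumptions: a plain multilayer-perceptron autoencoder (scikit-learn's MLPRegressor, without the sparsity penalty used in the paper) stands in for each deep sparse autoencoder, and the bootstrap, error aggregation, and ranking details are illustrative rather than taken from the paper. The function name dsaee_feature_ranking and all parameter names are hypothetical.

```python
# Minimal sketch of the DSAEE-style feature ranking idea (not the paper's exact
# algorithm): an ensemble of autoencoders is trained on majority-class samples
# only, per-feature reconstruction errors are aggregated for both classes, and
# features are ranked by how much worse the minority class is reconstructed.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

def dsaee_feature_ranking(X, y, majority_label=0, n_models=5,
                          bootstrap_frac=0.8, hidden=(64, 16, 64), seed=0):
    rng = np.random.default_rng(seed)
    scaler = StandardScaler().fit(X[y == majority_label])
    Xs = scaler.transform(X)
    X_maj = Xs[y == majority_label]
    X_min = Xs[y != majority_label]

    err_maj = np.zeros(X.shape[1])
    err_min = np.zeros(X.shape[1])
    for m in range(n_models):
        # Bootstrap the majority class so ensemble members see different samples.
        idx = rng.choice(len(X_maj), size=int(bootstrap_frac * len(X_maj)),
                         replace=True)
        ae = MLPRegressor(hidden_layer_sizes=hidden, max_iter=500, random_state=m)
        ae.fit(X_maj[idx], X_maj[idx])  # autoencoder: input reconstructs itself
        # Per-feature squared reconstruction error, averaged over samples.
        err_maj += ((ae.predict(X_maj) - X_maj) ** 2).mean(axis=0)
        err_min += ((ae.predict(X_min) - X_min) ** 2).mean(axis=0)

    # Features on which the minority class is reconstructed much worse than the
    # majority class are taken as the most discriminative ones.
    score = (err_min - err_maj) / n_models
    return np.argsort(score)[::-1]  # feature indices, most discriminative first
```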
Similar resources
Unsupervised Deep Autoencoders for Feature Extraction with Educational Data
The goal of this paper is to describe methods for automatically extracting features for student modeling from educational data, and students’ interaction-log data in particular, by training deep neural networks with unsupervised training. Several different types of autoencoder networks and structures are discussed, including deep neural networks, recurrent neural networks, variational autoencod...
Modular Autoencoders for Ensemble Feature Extraction
We introduce the concept of a Modular Autoencoder (MAE), capable of learning a set of diverse but complementary representations from unlabelled data, that can later be used for supervised tasks. The learning of the representations is controlled by a trade off parameter, and we show on six benchmark datasets the optimum lies between two extremes: a set of smaller, independent autoencoders each w...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
UDSFS: Unsupervised deep sparse feature selection
In this paper, we focus on unsupervised feature selection. As is well known, the combination of several feature units into a whole feature vector is broadly adopted for effective object representation, which may inevitably include some irrelevant/redundant feature units or feature dimensions. Most of the traditional feature selection models can only select the feature dimensions without concer...
Unsupervised feature selection for sparse data
Feature selection is a well-known problem in machine learning and pattern recognition. Many high-dimensional datasets are sparse, that is, many features have zero value. In some cases, we do not know the class label for some (or even all) patterns in the dataset, leading us to semi-supervised or unsupervised learning problems. For instance, in text classification with the bag-of-words (BoW) re...
Journal
Journal title: Statistical Analysis and Data Mining
Year: 2021
ISSN: 1932-1864, 1932-1872
DOI: https://doi.org/10.1002/sam.11567